Term-Weighting Approaches in Automatic Text Retrieval
Authors
Abstract
The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained in automatic term weighting, and provides baseline single-term-indexing models with which other more elaborate content analysis procedures can be compared.

1. AUTOMATIC TEXT ANALYSIS

In the late 1950s, Luhn [1] first suggested that automatic text retrieval systems could be designed based on a comparison of content identifiers attached both to the stored texts and to the users' information queries. Typically, certain words extracted from the texts of documents and queries would be used for content identification; alternatively, the content representations could be chosen manually by trained indexers familiar with the subject areas under consideration and with the contents of the document collections. In either case, the documents would be represented by term vectors of the form

$$D = (t_i, t_j, \ldots, t_p) \tag{1}$$

where each $t_k$ identifies a content term assigned to some sample document $D$. Analogously, the information requests, or queries, would be represented either in vector form, or in the form of Boolean statements. Thus, a typical query $Q$ might be formulated as

$$Q = (q_a, q_b, \ldots, q_r) \tag{2}$$
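The term-vector representation of documents and queries described in the abstract can be illustrated with a minimal sketch. It is not the article's own procedure: the helper names `content_terms` and `shared_terms`, the tiny stop-word list, and the sample texts are all assumptions made for illustration, and no stemming or weighting is applied.

```python
import re

def content_terms(text, stop_words=frozenset()):
    """Return the distinct lowercased words of `text`, in first-seen order.

    Hypothetical helper: a crude stand-in for extracting content
    identifiers from a document or query text.
    """
    seen = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word not in stop_words and word not in seen:
            seen.append(word)
    return tuple(seen)

def shared_terms(doc_terms, query_terms):
    """Return the query terms that also occur in the document's term vector."""
    return tuple(t for t in query_terms if t in doc_terms)

# Illustrative stop-word list (an assumption, not taken from the article).
STOP = frozenset({"the", "of", "in", "and", "a"})

# D and Q play the roles of the term vectors in expressions (1) and (2).
D = content_terms("The weighting of terms in automatic text retrieval", STOP)
Q = content_terms("term weighting and retrieval", STOP)

print(D)                    # document term vector
print(Q)                    # query term vector
print(shared_terms(D, Q))   # content identifiers common to both
```

Because the sketch does no stemming, the query term "term" does not match the document term "terms". Unweighted matching of this kind is only the starting point for the term-weighting schemes the article goes on to summarize.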
Similar resources
Semiautomatic Image Retrieval Using the High Level Semantic Labels
Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges associated with each of these approaches have led researchers to combine them and to use semi-automatic retrieval that includes user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provided two kinds of qu...
The Effect of Term Importance Degree on Text Retrieval
Various approaches to index term-weighting have been investigated. In fact, term-weighting is an indispensable process for document ranking in most retrieval systems. As well actual information retrieval systems have to deal with explosive growth of documents of various sizes and terms of various frequencies because an appropriate term-weighting scheme has a crucial impact on the overall perfor...
Term-Weighting for Summarization of Multi-party Spoken Dialogues
This paper explores the issue of term-weighting in the genre of spontaneous, multi-party spoken dialogues, with the intent of using such term-weights in the creation of extractive meeting summaries. The field of text information retrieval has yielded many term-weighting techniques to import for our purposes; this paper implements and compares several of these, namely tf.idf, Residual IDF and Ga...
A New Algorithm for Term Weighting in Text Summarization Process
The importance of good weighting methodology in information retrieval methods – the method that affects the most useful features of a document or query representative is examined. Good weighting methodologies are supposed to be more important than the feature selection process. Weighting features is the thing that many information retrieval systems are regarding as being of minor importance as ...
Improving automatic bug assignment using time-metadata in term-weighting
Assigning newly reported bugs to project developers is a time-consuming and tedious task for triagers using the traditional manual bug triage process. Previous efforts for creating automatic bug assignment systems use machine learning and information-retrieval techniques. These approaches commonly use tf-idf, a statistical computation technique for weighting terms based on term frequency. Howev...
Journal title:
- Inf. Process. Manage.
Volume 24, Issue
Pages -
Publication date 1988